-
Notifications
You must be signed in to change notification settings - Fork 3
Adding warning logs for potentially unsafe masking/unmasking calls #447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ptbank_filter_count_01
| "MASK": set(), | ||
| "UNMASK": {"CRBNK.ACCOUNTS.a_key"}, | ||
| }, | ||
| id="cryptbank_bubbleprop_01", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These "bubbleprop" tests all serve a specific purpose for the warning tracking: they all cover different kinds of situations where an expression is defined in one place that depends on unmask calls, gets propagated further along in the plan, then later gets used in a sensitive manner. Specifically, all of them are a window function where the inputs are sensitive columns (sometimes transformed by more functions), the output sometimes gets transformed further, and the window function eventually gets used:
- By nothing sensitive (so no warnings related to the window function)
- As part of filter condition
- As part of a join condition
- As part of an ordering key in a limit
- As part of an ordering key in the root
- As an aggregation key
| AGGREGATE(keys={'bucket': ROUND(cumavg / 10.0:numeric, 0:numeric) * 10:numeric}, aggregations={'n_rows': COUNT()}) | ||
| JOIN(condition=t0.t_sourceaccount == UNMASK::(CASE WHEN [t1.a_key] = 0 THEN 0 ELSE (CASE WHEN [t1.a_key] > 0 THEN 1 ELSE -1 END) * CAST(SUBSTRING([t1.a_key], 1 + INSTR([t1.a_key], '-'), LENGTH([t1.a_key]) / 2) AS INTEGER) END), type=INNER, cardinality=SINGULAR_FILTER, reverse_cardinality=PLURAL_FILTER, columns={'cumavg': t0.cumavg}) | ||
| PROJECT(columns={'cumavg': RELAVG(args=[SQRT(UNMASK::((1025.67 - ([t_amount]))))], partition=[a_branchkey], order=[(UNMASK::(DATETIME([t_ts], '+54321 seconds'))):asc_last], cumulative=True), 't_sourceaccount': t_sourceaccount}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we can see a good instance of the propagation required to correctly identify some of the critical dependencies:
- Unmask calls to
t_amountandt_tsare used inside theRELAVGandSQRTcalls used to derivecumavg cumavgis bubbled upward through the join (which depends on unmaskinga_key)- The bubbled value of
cumavgis then divided by 10, rounded, multiplied by 10, then used as an aggregation key -> therefore the ability to unmaskt_amountandt_tsare both critical dependencies.
Therefore, we will have warning logs about the safety of unmasking t_ts, t_amount, and a_key.
However, in bubbleprop_01, we only have warning logs for a_key. Even though 01 still has the same window function & join, the value of cumavg never gets used in a sensitive manner, so if the user has bad permissions then the output will be garbage w/o necessarily being malformed (we can change this definition if we need to).
hadia206
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
| self.stack: list[set[tuple[str, bool]]] = [] | ||
|
|
||
| def reset(self) -> None: | ||
| self.stack = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it also reset self.input_dependencies?
| batch_requests_made: list[set[str]] = extract_batch_requests_from_logs( | ||
| caplog.text | ||
| ) | ||
| batch_requests_made: list[set[str]] = extract_batch_requests_from_logs(caplog.text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why remove with redirect_stdout here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it shouldn't be around this line, it should be around the to_sql part.
… the name range tests
Adds a relational visitor pass at the end of the relational optimizer designed to traverse the final relational tree and identify any mask/unmask calls that would cause a critical logical error if the user does not have permission. The pass finds all mask/unmask calls that, at some point in the plan, are used in a sensitive manner so one of the following will happen if the user cannot make the mask/unmask plan:
Those three types of critical errors will happen if the mask/unmask call without sufficient permissions is eventually used in one of the following:
If the values are garbled in one of the following ways, it is not considered a critical failure unless the garbled value is used in one of the sensitive manners described above:
Since PyDough does not (currently) have a way to determine whether a user has permission to mask/unmask a column, the visitor will identify all sensitive dependencies that would cause a critical error if the user does not have permission. Then, it will dump all of these dependencies via logger warnings in the following format (per column):